Introduction

R Configuration

Below we display our sessionInfo().

sessionInfo(package=NULL)
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X El Capitan 10.11.6
## 
## locale:
## [1] C
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] backports_1.0.5 magrittr_1.5    rprojroot_1.2   tools_3.3.2    
##  [5] htmltools_0.3.5 yaml_2.1.14     Rcpp_0.12.10    stringi_1.1.2  
##  [9] rmarkdown_1.3   knitr_1.15.1    stringr_1.1.0   digest_0.6.11  
## [13] evaluate_0.10

ETL

For both datasets we run the relevant ETL operations. We clean the data by first removing special characters (e.g. - ~) from the column names. We then decide which columns are measures and which are dimensions.

For dimensions, we change NA to an empty string, change “&” to “and”, change “:” to “;”. We get rid of " and ’.

For measures, we change NA to 0. We get rid of all characters except for numbers and the - sign.

## [1] "Population"
## [1] "Employment"
## [1] "Unemployment"
## [1] "Unemployment.rate"
## [1] "Marginally.Food.Insecure"
## [1] "Food.Insecure"
## [1] "Very.Low.Food.Secure"
## [1] "Gross.State.Product"
## [1] "Number.of.low.income.uninsured.children"
## [1] "Percent.Low.Income.Unisured.Children"
## [1] "Personal.income"
## [1] "Workers..compensation"
## CREATE TABLE 01_Dataclean_employment (
## -- Change table_name to the table name you want.
##  state_name varchar2(4000),
##  state varchar2(4000),
##  year varchar2(4000),
##  Population number(38,4),
##  Employment number(38,4),
##  Unemployment number(38,4),
##  Unemployment.rate number(38,4),
##  Marginally.Food.Insecure number(38,4),
##  Food.Insecure number(38,4),
##  Very.Low.Food.Secure number(38,4),
##  Gross.State.Product number(38,4),
##  Number.of.low.income.uninsured.children number(38,4),
##  Percent.Low.Income.Unisured.Children number(38,4),
##  Personal.income number(38,4),
##  Workers..compensation number(38,4)
##  );
##       state_name      state       year      Population      
##  Alabama   :  5   1      :  5   2010:51   Min.   :  564516  
##  Alaska    :  5   10     :  5   2011:51   1st Qu.: 1623796  
##  Arizona   :  5   11     :  5   2012:51   Median : 4382667  
##  Arkansas  :  5   12     :  5   2013:51   Mean   : 6158836  
##  California:  5   13     :  5   2014:51   3rd Qu.: 6789176  
##  Colorado  :  5   14     :  5             Max.   :38792291  
##  (Other)   :225   (Other):225                               
##    Employment        Unemployment     Unemployment.rate
##  Min.   :  283744   Min.   :  11152   Min.   : 2.700   
##  1st Qu.:  730472   1st Qu.:  54283   1st Qu.: 6.000   
##  Median : 1877812   Median : 157581   Median : 7.300   
##  Mean   : 2796855   Mean   : 244360   Mean   : 7.368   
##  3rd Qu.: 3235551   3rd Qu.: 288906   3rd Qu.: 8.650   
##  Max.   :17348645   Max.   :2244326   Max.   :13.500   
##                                                        
##  Marginally.Food.Insecure Food.Insecure    Very.Low.Food.Secure
##  Min.   :11.71            Min.   : 7.883   Min.   :2.036       
##  1st Qu.:22.16            1st Qu.:13.051   1st Qu.:4.402       
##  Median :25.47            Median :15.394   Median :5.378       
##  Mean   :25.45            Mean   :15.362   Mean   :5.368       
##  3rd Qu.:28.50            3rd Qu.:17.266   3rd Qu.:6.159       
##  Max.   :41.08            Max.   :25.224   Max.   :9.197       
##                                                                
##  Gross.State.Product Number.of.low.income.uninsured.children
##  Min.   :  26570     Min.   :  1.00                         
##  1st Qu.:  76363     1st Qu.: 14.00                         
##  Median : 190304     Median : 44.00                         
##  Mean   : 315502     Mean   : 83.01                         
##  3rd Qu.: 404486     3rd Qu.: 85.00                         
##  Max.   :2311616     Max.   :843.00                         
##                                                             
##  Percent.Low.Income.Unisured.Children Personal.income    
##  Min.   : 0.700                       Min.   :2.561e+07  
##  1st Qu.: 3.000                       1st Qu.:6.498e+07  
##  Median : 4.100                       Median :1.664e+08  
##  Mean   : 4.756                       Mean   :2.685e+08  
##  3rd Qu.: 6.100                       3rd Qu.:3.427e+08  
##  Max.   :15.000                       Max.   :1.978e+09  
##                                                          
##  Workers..compensation
##  Min.   :   7907      
##  1st Qu.:  36316      
##  Median : 110973      
##  Mean   : 298914      
##  3rd Qu.: 246610      
##  Max.   :2443512      
## 
##     state_name state year Population Employment Unemployment
## 5   California     5 2010   37334079   16091945      2244326
## 56  California     5 2011   37700034   16258133      2156967
## 107 California     5 2012   38056055   16602672      1921121
## 158 California     5 2013   38414128   16958735      1665590
## 209 California     5 2014   38792291   17348645      1406380
##     Unemployment.rate Marginally.Food.Insecure Food.Insecure
## 5                12.2                 29.61473      18.56144
## 56               11.7                 31.61105      19.07340
## 107              10.4                 27.89361      16.48582
## 158               8.9                 25.53890      15.72165
## 209               7.5                 24.16489      13.69295
##     Very.Low.Food.Secure Gross.State.Product
## 5                5.71067             1953411
## 56               6.10045             2030468
## 107              5.79252             2125717
## 158              5.06253             2202678
## 209              4.15612             2311616
##     Number.of.low.income.uninsured.children
## 5                                       763
## 56                                      770
## 107                                     653
## 158                                     488
## 209                                     341
##     Percent.Low.Income.Unisured.Children Personal.income
## 5                                    7.8      1617134250
## 56                                   7.8      1727433579
## 107                                  6.7      1838567162
## 158                                  5.0      1861956514
## 209                                  4.0      1977923740
##     Workers..compensation
## 5                 2067143
## 56                2062255
## 107               2042670
## 158               1990609
## 209               2072792
## [1] "Population"
## [1] "Violent.crime.total"
## [1] "Murder.and.nonnegligent.Manslaughter"
## [1] "Legacy.rape..1"
## [1] "Revised.rape..2"
## [1] "Robbery"
## [1] "Aggravated.assault"
## [1] "Property.crime.total"
## [1] "Burglary"
## [1] "Larceny.theft"
## [1] "Motor.vehicle.theft"
## [1] "Violent.Crime.rate"
## [1] "Murder.and.nonnegligent.manslaughter.rate"
## [1] "Legacy.rape.rate..1"
## [1] "Revised.rape.rate..2"
## [1] "Robbery.rate"
## [1] "Aggravated.assault.rate"
## [1] "Property.crime.rate"
## [1] "Burglary.rate"
## [1] "Larceny.theft.rate"
## [1] "Motor.vehicle.theft.rate"
## CREATE TABLE 01_Dataclean_crime (
## -- Change table_name to the table name you want.
##  State varchar2(4000),
##  Year varchar2(4000),
##  Population number(38,4),
##  Violent.crime.total number(38,4),
##  Murder.and.nonnegligent.Manslaughter number(38,4),
##  Legacy.rape..1 number(38,4),
##  Revised.rape..2 number(38,4),
##  Robbery number(38,4),
##  Aggravated.assault number(38,4),
##  Property.crime.total number(38,4),
##  Burglary number(38,4),
##  Larceny.theft number(38,4),
##  Motor.vehicle.theft number(38,4),
##  Violent.Crime.rate number(38,4),
##  Murder.and.nonnegligent.manslaughter.rate number(38,4),
##  Legacy.rape.rate..1 number(38,4),
##  Revised.rape.rate..2 number(38,4),
##  Robbery.rate number(38,4),
##  Aggravated.assault.rate number(38,4),
##  Property.crime.rate number(38,4),
##  Burglary.rate number(38,4),
##  Larceny.theft.rate number(38,4),
##  Motor.vehicle.theft.rate number(38,4)
##  );

Tableau Visualizations


vis1


This is a barchart with Employment on the Columns and State, filtered by year, on the rows. The black line is the state aggregated employment average, so the bars that pass the black line show that during that year, the employment number passed the employment state average for all the years. The red and blue color coding represents whether the employment number passed the national aggregated employment average. The difference between a certain year’s employment in that state and the national average is shown as the numbers next to the bars.

vis2


The graph is a ID set, the graph shows the gross state products among all 50 states. There are two sets, one is high gross state product, and another one medium gross state product. The darker the blue dot means a higher gross state product percentage. One can see from the graph that northeastern states have higher gross state product percentages.

vis3


This is a map of all the states with their worker’s compensation vs income. The darker the color of the state means that the employees got more worker’s compensation for the amount they made. It’s interesting to note that working on the west region of the United States gives the best worker’s compensation. Working in the central region of the United States results in poor worker’s compensation for the income rate.

Shiny Application


vis4


The above shiny visulization shows the same results as the Tableau version of the barchart.

vis4


The above shiny visulization shows the same results as the Tableau version of the ID set. It colors each state based on high or low GSP.

vis4


The above shiny visulization shows the same results as the Tableau version of the map.